<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
  <title>Dueling DQN - Reinforcement Learning Coach Documentation</title>
  

  <link rel="shortcut icon" href="../../../img/favicon.ico">

  
  <link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>

  <link rel="stylesheet" href="../../../css/theme.css" type="text/css" />
  <link rel="stylesheet" href="../../../css/theme_extra.css" type="text/css" />
  <link rel="stylesheet" href="../../../css/highlight.css">
  <link href="../../../extra.css" rel="stylesheet">

  
  <script>
    // Current page data
    var mkdocs_page_name = "Dueling DQN";
  </script>
  
  <script src="../../../js/jquery-2.1.1.min.js"></script>
  <script src="../../../js/modernizr-2.8.3.min.js"></script>
  <script type="text/javascript" src="../../../js/highlight.pack.js"></script>
  <script src="../../../js/theme.js"></script> 
  <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML"></script>

  
</head>

<body class="wy-body-for-nav" role="document">

  <div class="wy-grid-for-nav">

    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
      <div class="wy-side-nav-search">
        <a href="../../.." class="icon icon-home"> Reinforcement Learning Coach Documentation</a>
        <div role="search">
  <form id ="rtd-search-form" class="wy-form" action="../../../search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" />
  </form>
</div>
      </div>

      <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
        <ul class="current">
          
            <li>
    <li class="toctree-l1 ">
        <a class="" href="../../..">Home</a>
        
    </li>
<li>
          
            <li>
    <li class="toctree-l1 ">
        <a class="" href="../../../design/index.html">Design</a>
        
    </li>
<li>
          
            <li>
    <li class="toctree-l1 ">
        <a class="" href="../../../usage/index.html">Usage</a>
        
    </li>
<li>
          
            <li>
    <ul class="subnav">
    <li><span>Algorithms</span></li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../dqn/index.html">DQN</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../double_dqn/index.html">Double DQN</a>
        
    </li>

        
            
    <li class="toctree-l1 current">
        <a class="current" href="./index.html">Dueling DQN</a>
        
            <ul>
            
                <li class="toctree-l3"><a href="#dueling-dqn">Dueling DQN</a></li>
                
                    <li><a class="toctree-l4" href="#network-structure">Network Structure</a></li>
                
                    <li><a class="toctree-l4" href="#general-description">General Description</a></li>
                
            
            </ul>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../categorical_dqn/index.html">Categorical DQN</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../mmc/index.html">Mixed Monte Carlo</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../pal/index.html">Persistent Advantage Learning</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../nec/index.html">Neural Episodic Control</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../bs_dqn/index.html">Bootstrapped DQN</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../n_step/index.html">N-Step Q Learning</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../naf/index.html">Normalized Advantage Functions</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../policy_optimization/pg/index.html">Policy Gradient</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../policy_optimization/ac/index.html">Actor-Critic</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../policy_optimization/ddpg/index.html">Deep Deterministic Policy Gradients</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../policy_optimization/ppo/index.html">Proximal Policy Optimization</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../policy_optimization/cppo/index.html">Clipped Proximal Policy Optimization</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../other/dfp/index.html">Direct Future Prediction</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../imitation/bc/index.html">Behavioral Cloning</a>
        
    </li>

        
    </ul>
<li>
          
            <li>
    <li class="toctree-l1 ">
        <a class="" href="../../../dashboard/index.html">Coach Dashboard</a>
        
    </li>
<li>
          
            <li>
    <ul class="subnav">
    <li><span>Contributing</span></li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../../contributing/add_agent/index.html">Adding a New Agent</a>
        
    </li>

        
            
    <li class="toctree-l1 ">
        <a class="" href="../../../contributing/add_env/index.html">Adding a New Environment</a>
        
    </li>

        
    </ul>
<li>
          
        </ul>
      </div>
      &nbsp;
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      
      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
        <a href="../../..">Reinforcement Learning Coach Documentation</a>
      </nav>

      
      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="breadcrumbs navigation">
  <ul class="wy-breadcrumbs">
    <li><a href="../../..">Docs</a> &raquo;</li>
    
      
        
          <li>Algorithms &raquo;</li>
        
      
    
    <li>Dueling DQN</li>
    <li class="wy-breadcrumbs-aside">
      
    </li>
  </ul>
  <hr/>
</div>
          <div role="main">
            <div class="section">
              
                <h1 id="dueling-dqn">Dueling DQN</h1>
<p><strong>Action space:</strong> Discrete</p>
<p><strong>References:</strong> <a href="https://arxiv.org/abs/1511.06581">Dueling Network Architectures for Deep Reinforcement Learning</a></p>
<h2 id="network-structure">Network Structure</h2>
<p style="text-align: center;">

<img src="../../design_imgs/dueling_dqn.png" alt="Dueling DQN network structure">

</p>

<h2 id="general-description">General Description</h2>
<p>Dueling DQN differs from DQN in its network structure.</p>
<p>Dueling DQN uses a specialized <em>Dueling Q Head</em> to split <script type="math/tex"> Q </script> into an <script type="math/tex"> A </script> (advantage) stream and a <script type="math/tex"> V </script> (state value) stream. Adding this structure to the network head helps the network differentiate actions from one another, and can significantly improve learning.</p>
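<p>As an illustration of how the two streams can be recombined, the referenced paper uses a mean-subtracted aggregation. Here <script type="math/tex"> s </script> is the state, <script type="math/tex"> a </script> an action, and <script type="math/tex"> |\mathcal{A}| </script> the number of discrete actions. This notation is introduced here for illustration, and whether the <em>Dueling Q Head</em> uses exactly this form is an assumption:</p>
<script type="math/tex; mode=display">Q(s, a) = V(s) + \left( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \right)</script>
<p>Subtracting the mean advantage keeps the split identifiable: without it, a constant could be moved between <script type="math/tex"> V </script> and <script type="math/tex"> A </script> without changing <script type="math/tex"> Q </script>.</p>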
<p>In many states, the values of the different actions are very similar, and the choice of action matters little.
This is especially relevant in environments with many actions to choose from. In DQN, on each training iteration, for each state in the batch, we update the <script type="math/tex">Q</script> values only for the specific actions taken in those states. This slows learning, since we learn nothing about the <script type="math/tex">Q</script> values of actions that have not been taken yet. With the dueling architecture, on the other hand, learning is faster, because we start learning the state value even if only a single action has been taken in that state.</p>
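<p>The effect of the shared state value can be seen in a minimal sketch. It assumes the mean-subtracted aggregation from the referenced paper; the function name <code>dueling_q_values</code> and the numbers are hypothetical and are not taken from Coach's implementation.</p>
<pre><code class="python">def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using mean-subtracted advantages."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + (a - mean_advantage) for a in advantages]


# Hypothetical head outputs for a single state with three actions.
advantages = [0.5, -0.5, 0.0]
q_before = dueling_q_values(1.0, advantages)

# Suppose an update for one taken action raises only the V stream output.
# Every Q value shifts by the same amount, including the Q values of
# actions that were never taken in this state.
q_after = dueling_q_values(1.5, advantages)

print(q_before)  # [1.5, 0.5, 1.0]
print(q_after)   # [2.0, 1.0, 1.5]
</code></pre>
<p>In a plain DQN head, the same update would move only the output for the action that was taken.</p>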
              
            </div>
          </div>
          <footer>
  
    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
      
        <a href="../categorical_dqn/index.html" class="btn btn-neutral float-right" title="Categorical DQN">Next <span class="icon icon-circle-arrow-right"></span></a>
      
      
        <a href="../double_dqn/index.html" class="btn btn-neutral" title="Double DQN"><span class="icon icon-circle-arrow-left"></span> Previous</a>
      
    </div>
  

  <hr/>

  <div role="contentinfo">
    <!-- Copyright etc -->
    
  </div>

  Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
	  
        </div>
      </div>

    </section>

  </div>

<div class="rst-versions" role="note" style="cursor: pointer">
    <span class="rst-current-version" data-toggle="rst-current-version">
      
      
        <span><a href="../double_dqn/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
      
      
        <span style="margin-left: 15px"><a href="../categorical_dqn/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
      
    </span>
</div>

</body>
</html>
